On-demand language model interpolation for mobile speech input
Abstract
Google offers several speech features on the Android mobile operating system: search by voice, voice input to any text field, and an API for application developers. As a result, our speech recognition service must support a wide range of usage scenarios and speaking styles: relatively short search queries, addresses, business names, dictated SMS and e-mail messages, and a long tail of spoken input to any of the applications users may install. We present a method of on-demand language model interpolation in which contextual information about each utterance determines interpolation weights among a number of n-gram language models. On-demand interpolation results in an 11.2% relative reduction in word error rate (WER) compared to using a single language model to handle all traffic.
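As a rough illustration of the idea in the abstract, the sketch below uses linear interpolation, a standard way to combine language models, with weights looked up from the utterance context. The abstract does not say how the weights are derived, so the context labels, the weight table, the toy unigram models, and the function names are all hypothetical and stand in for whatever the actual system uses.

```python
from typing import Dict

# Hypothetical toy models: each maps a word to P(word | history).
# A real n-gram model conditions on the history; it is ignored here for brevity.
LanguageModel = Dict[str, float]

MODELS: Dict[str, LanguageModel] = {
    "search": {"pizza": 0.6, "hello": 0.1, "meeting": 0.3},
    "dictation": {"pizza": 0.1, "hello": 0.5, "meeting": 0.4},
}

# Hypothetical interpolation weights per usage context (each row sums to 1).
CONTEXT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "search_box": {"search": 0.9, "dictation": 0.1},
    "sms_field": {"search": 0.2, "dictation": 0.8},
}

# Fallback for contexts with no dedicated weights (an assumption of this sketch).
DEFAULT_WEIGHTS: Dict[str, float] = {"search": 0.5, "dictation": 0.5}


def weights_for(context: str) -> Dict[str, float]:
    """Pick interpolation weights on demand from the utterance context."""
    return CONTEXT_WEIGHTS.get(context, DEFAULT_WEIGHTS)


def interpolated_prob(word: str, context: str) -> float:
    """Linear interpolation: P(w) = sum_i lambda_i(context) * P_i(w)."""
    weights = weights_for(context)
    return sum(lam * MODELS[name].get(word, 0.0) for name, lam in weights.items())


if __name__ == "__main__":
    for ctx in ("search_box", "sms_field", "unknown_app"):
        print(ctx, round(interpolated_prob("pizza", ctx), 3))
```

Because each component model and each weight row sums to 1, the interpolated distribution is also normalized. The same word receives a different probability depending on where the utterance originated.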
Related resources
Maximum Entropy Language Model Adaptation for Mobile Speech Input
This paper describes unsupervised adaptation of a language model for many related target domains. In mobile speech input, the subject and vocabulary of the language depend heavily on the usage context. We use automatically transcribed speech data to select a subset from the language model training data for building a maximum entropy model adapted to speech input. This model is further adapted for most...
Bayesian Language Model Interpolation for Mobile Speech Input
This paper explores various static interpolation methods for approximating a single dynamically-interpolated language model used for a variety of recognition tasks on the Google Android platform. The goal is to find the statically-interpolated first-pass LM that best reduces search errors in a two-pass system or that even allows eliminating the more complex dynamic second pass entirely. Static i...
Comparing the Impact of Audio-Visual Input Enhancement on Collocation Learning in Traditional and Mobile Learning Contexts
This study investigated the impact of audio-visual input enhancement teaching techniques on improving English as a Foreign Language (EFL) learners' collocation learning as well as their accuracy concerning collocation use in narrative writing. In addition, it compared the impact and efficiency of audio-visual input enhancement in two learning contexts, namely traditional and mo...
Combining language models in the input interface of a spoken dialogue system
This paper presents a new technique to enhance the performance of the input interface of spoken dialogue systems based on a procedure that combines during speech recognition the advantages of using prompt-dependent language models with those of using a language model independent of the prompts generated by the dialogue system. The technique proposes to create a new speech recognizer, termed con...
Waveform Interpolation Speech Coder at 4 kb/s
Speech coding at bit rates near 4 kbps is expected to be widely deployed in applications such as visual telephony, mobile and personal communications. This research focuses on developing a speech coder based on the waveform interpolation (WI) scheme, with an attempt to deliver near toll-quality speech at rates around 4 kbps. A WI coder has been simulated in floating-point using the C programmin...
Publication date: 2010